Visual saliency and potential field data enhancements: Where is your attention drawn?
Abstract
Interpretation of gravity and magnetic data for exploration applications may be based on pattern recognition in which geophysical signatures of geologic features associated with localized characteristics are sought within data. A crucial control on what comprises noticeable and comparable characteristics in a data set is how images displaying those data are enhanced. Interpreters are provided with various image enhancement and display tools to assist their interpretation, although the effectiveness of these tools in improving geologic feature detection is difficult to measure. We addressed this challenge by analyzing how image enhancement methods affect the interpreter's visual attention when interpreting the data, because features that are more salient to the human visual system are more likely to be noticed. We used geologic target-spotting exercises within images generated from magnetic data to assess commonly used magnetic data visualization methods for their visual saliency. Our aim was achieved in two stages. In the first stage, we identified a suitable saliency detection algorithm that can computationally predict the visual attention of magnetic data interpreters. The computer vision community has developed various image saliency detection algorithms, and we assessed which algorithm best matches the interpreter's data observation patterns for magnetic target-spotting exercises. In the second stage, we applied this saliency detection algorithm to understand potential visual biases for commonly used magnetic data enhancement methods. We developed a guide to choosing image enhancement methods, based on saliency maps that minimize unintended visual biases in magnetic data interpretation, and some recommendations for identifying exploration targets in different types of magnetic data.

Introduction

Interpretation of magnetic or any other contour-mapped representation of geoscience data is primarily based on a pattern recognition process in which anomalies indicating geologic features are sought within the data and their spatial associations analyzed. It is common practice to process or enhance the data prior to display to bring out characteristics perceived to be useful to the interpreter. The combination of high- and/or low-pass filtering (LPF), color contour mapping, and sun-angle shading is widely used by potential field data interpreters. Conventionally, interpreters select different data enhancement methods (Blakely, 1995) based on their prior knowledge of these methods or by trial and error to enhance specific features of interest or data characteristics. In practice, it is common for interpreters to use multiple enhancement methods, for example, using high- and low-pass filtering to bring out anomalies associated with causative sources at different depths, or using multiple different high-pass filters to find discontinuities within data. In addition, images are also visualized using different color display and shading methods. Welland et al. (2006) report the impact of human visual perception of colors on seismic data interpretation. Even though the findings are not described in detail, their study is based on the nonlinear nature of human color perception, in which the same amount of change in different bands of the visual spectrum, such as yellow and blue in an image, is not perceived as the same change by the interpreter. To address this, they propose a modified color bar to compensate for visual bias in the interpretation of seismic data.
For potential field data, we previously reported the impact of human data interactions on geologic target-spotting (Sivarajah et al., 2013). This study shows that viewing the data in two different orientations and carrying out a systematic target search impact target-spotting performance. Evidently, how we view and interact with data plays a significant role in data interpretation. In the fields of psychology and computer vision, there has been active research on understanding and emulating human visual attention. In our visual and other sensory systems, a key attention mechanism is saliency: a quality that makes certain items (objects, faces, sounds, etc.) "stand out" from their surroundings. Thus, visual saliency is typically associated with contrast from neighbors, such as a bright object within a dark image background, and is called the bottom-up influence. Visual saliency can also be influenced by memory or anticipatory mechanisms through training, for example, identifying your child's face in a school group photograph or looking at moving cars when crossing the road. This is called the top-down influence. In psychology, human attention has been modeled using the bottom-up and top-down influences, including the learning of attention prioritization using these influences (van de Laar et al., 1997). The computer vision community, on the other hand, focuses on emulating the bottom-up influence computationally using saliency detection algorithms (Itti et al., 1998; Harel et al., 2006). Many algorithms have been developed to identify image saliency. These are based on (1) a biological model using spatial contrasts in color, intensity, and orientation (Itti et al., 1998), (2) purely computational approaches using frequency analysis (Achanta et al., 2008, 2009; Achanta and Süsstrunk, 2010), or (3) a combination of the two (Harel et al., 2006). Visual attention maps computed using these algorithms are called saliency maps. In previous work, saliency maps have been used for various applications, such as scene classification (Siagian and Itti, 2007), text detection (Sun et al., 2010), object detection (Walther et al., 2002), visual search (Elazary and Itti, 2010), and automatic seam line detection for the merging of optical remote-sensing images (Yu et al., 2012). In another study, Su et al. (2004) investigate the possibility of using the inverted saliency model for display enhancement of natural images. We present a novel study of human attention based on saliency models for the task of analyzing interpreter biases.
We aimed to determine whether saliency maps can effectively represent interpreters' visual attention for magnetic data and then be used to understand potential biases in data observation when interpreting the data using different visualization methods. This research was conducted in two stages. In the first stage, we compared interpreters' visual attention maps with saliency maps generated from three widely known saliency detection algorithms. The interpreters' visual attention was determined by identifying eye gaze fixation locations. Fixation is defined as maintaining eye gaze at a particular location for at least 100–150 ms (Viviani, 1990). As visual attention moves to a new location, the eye gaze will try to follow (Deubel and Schneider, 1996) and typically fixate on locations that an individual finds to be surprising, salient, or significant (Loftus and Mackworth, 1978). To capture this information, we carried out a target-spotting experiment in which the interpreters' eye gaze movements were acquired using an eye tracker system (ETS). In this experiment, the task was to identify responses associated with porphyry-style mineralization within magnetic data. Our preliminary studies (Chadwick et al., 2010; Sivarajah et al., 2012) demonstrated the feasibility of capturing ETS data to monitor and analyze human data interactions during target-spotting exercises on magnetic and seismic data sets. For this study, we used two separate target-spotting exercises. In the first exercise, we displayed small-scale images, each containing either a single target or background noise. In the second exercise, we displayed a large-scale image containing multiple targets. A set of saliency maps was generated from the magnetic images using different saliency algorithms. These saliency maps were then compared with the eye-tracking results, and the algorithm whose saliency maps most closely matched the interpreters' data observation was identified. Previously, researchers have used a similar approach to demonstrate the correlation of saliency maps and interpreter-data interactions using ETS for natural images (Harel et al., 2006; Li et al., 2013). However, such analysis has not been conducted to date for geoscientific data interpretation. In the second stage, we applied the selected saliency algorithm to predict how widely used magnetic data enhancement methods will impact human visual attention during interpretation. The regions in the data likely to attract visual attention were highlighted using the saliency detection algorithm selected in the first stage, revealing potential unintended visual biases. When a region without an anomaly/target attracts visual attention, it is considered unintended visual attention. This saliency analysis was performed based on the assumption that a target can be more easily identified if it is located within a region that attracts interpreter attention than if it is located in a region that does not. We propose that saliency maps can be used to guide the selection of enhancement methods to reduce these unintended visual biases by identifying the enhancement methods that produce dissimilar and complementary saliency maps. Potentially, such insight can also assist in the design of new data enhancement and filtering methods. In this paper, we report the experimental details, ETS data capture and processing, and the selection of the most suitable saliency detection algorithm.
Then we present the analysis of interpreter biases using the selected saliency detection algorithm to evaluate commonly used enhancement methods, along with the limitations and applicability of the findings. Finally, we discuss our conclusions and ongoing research. We provide a list of abbreviations used in this paper in Appendix A.

Interpreter visual attention versus image saliency

Our study analyzes the effectiveness of saliency maps in predicting interpreters' visual attention and then selects the most suitable saliency detection algorithm for magnetic data. The interpreter visual attention maps were captured through an experiment requiring participants to recognize targets that have characteristics suggestive of gold-copper-rich porphyry systems. The relevant magnetic anomalies have a distinctive "Mexican-hat"-like character comprising subcircular magnetic highs with surrounding annular lows (Holden et al., 2011; Hoschke, 2011), as shown in Figure 5a-1. All the interpreters who participated in this study were trained geophysicists or geologists with experience in magnetic data interpretation and had normal or corrected-to-normal vision (i.e., using contact lenses). The survey used in this experiment is over a mature exploration area that contains several known deposits confirmed by drilling. Figure 1 shows that the main porphyry belt runs from the top left corner to the bottom right (outlined) and consists of porphyritic intrusions with dacitic, granodioritic, quartz dioritic, and dioritic compositions. Volcaniclastic and pyroclastic breccias are present, along with shale-siltstone, sandstone, minor volcanic rocks, and mafic to intermediate dikes. The strong positive circular to elliptical magnetic responses correspond to porphyry-style deposits. Ground magnetic data were collected with a line spacing of 100 m and gridded with a 25-m cell size. The data were upward continued to 50 m to suppress noisy short-wavelength responses, which mostly originate from the near surface. Finally, the data were reduced to the pole (RTP) to give symmetrical responses and center the anomaly peak over the center of the porphyritic intrusions. The magnetic image was illuminated with a false sun located on the north side of the region covered by the data at an inclination of 45°. The location of the study area has been withheld due to agreed commercial confidentiality.

Experiment setup

Participants were seated in front of a display monitor (52 × 33 cm) at a convenient distance (from 60 to 100 cm) and were then fitted with ETS glasses to capture their eye gaze movements (Figure 2). To maximize the participants' engagement with the target-spotting task, they were requested to respond to targets by pressing a key on a keyboard as soon as they spotted an anomaly likely to indicate a porphyry deposit (which we term a porphyry). Participants were requested to perform two different exercises during this experiment to capture their data observation patterns: spotting targets within small-scale images and within a large-scale image. Written instructions were displayed on the monitor at the beginning of each exercise. For exercise 1, the magnetic image was cropped into small images with porphyries, i.e., "target" images, and without porphyries, i.e., "nontarget" images (Figure 5, top row).
These target images were obtained from the regions where known deposits are located, and the nontarget images were selected from regions without significant porphyry-style anomalies. The target and nontarget images were displayed in rapid succession to six participants for target identification. In the visual display, target and nontarget images were shown at the center of the monitor (within a 23.4 × 17.4 cm area). The images were shown in a random sequence, but the sequence was identical for all participants. We displayed each image for 1000 ms with an interimage interval of 1000 ms, during which a blank screen was shown. In exercise 2, the magnetic image with multiple targets (Figure 6a) was displayed on the entire monitor for 3 min. Fourteen interpreters participated in this exercise, and they were requested to identify as many porphyries as possible within that time.

Figure 1. Magnetic data with multiple porphyry-style mineralization; the arrow indicates the main porphyry belt. Note the Mexican-hat geometry of the anomalies. Data are courtesy of Barrick Gold of Australia Ltd.

Figure 2. Experiment participant wearing eye-tracker glasses.

Eye-tracker data acquisition and processing

This study used a mobile eye tracker available from Applied Science Laboratories. The eye tracker uses two video cameras and three infrared light-emitting diodes (LEDs), which are mounted on a pair of standard safety glasses. There is a circular cutout in the right lens of the glasses, which allows for the placement of an adjustable monocle that reflects the infrared light beam from the LEDs (arranged in a triangular pattern) onto the eye surface. Eye gaze is determined by comparing the infrared light reflected from the cornea and the pupil, captured by the first camera. A forward-facing second camera records the interpreter's field of view (FOV). The eye tracker needs to be calibrated for every subject to enable accurate calculation of individual eye gaze coordinates. Calibration is achieved by requesting that participants fix their gaze on known locations and marking those points on the FOV video frame. We used 13 points to cover the entire monitor. The ETS records the FOV camera video frames together with the locations of the eye gaze with respect to the FOV. The eye gaze locations with respect to the displayed data region were calculated by analyzing each video frame to identify the location of the displayed data region (the corners of the monitor) with respect to the FOV video frame. An image processing algorithm was developed to calculate eye gaze coordinates on the displayed data region. The algorithm first corrects the known barrel distortion introduced by the camera lens (Figure 3a) and transforms the FOV image into a perfect perspective projection image (Figure 3b). It then identifies the boundaries of the data region using edge detection and the Hough transform (Figure 3c). Based on these boundaries, the algorithm calculates the four corners of the rectangular data region within the FOV (Figure 3d). Finally, it transforms the eye gaze coordinates from the FOV frame to the image/data frame (Figure 3e) using 2D homography (Hartley and Zisserman, 2003).
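To make this corner-to-image mapping concrete, the following is a minimal sketch written with OpenCV. The camera parameters, corner coordinates, and function names are illustrative assumptions, not the authors' actual implementation.

```python
import cv2
import numpy as np

def undistort_frame(frame, camera_matrix, dist_coeffs):
    # Correct barrel distortion so the monitor edges become straight lines,
    # giving a perspective projection image suitable for line detection.
    return cv2.undistort(frame, camera_matrix, dist_coeffs)

def map_gaze_to_image(gaze_xy, fov_corners, image_size):
    # Map a gaze point from FOV-frame pixels to displayed-image pixels
    # using the 2D homography defined by four point correspondences.
    w, h = image_size
    src = np.asarray(fov_corners, dtype=np.float32)
    dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    H = cv2.getPerspectiveTransform(src, dst)   # homography from 4 corners
    pt = np.float32([[gaze_xy]])                # shape (1, 1, 2) as OpenCV expects
    return cv2.perspectiveTransform(pt, H)[0, 0]

# Hypothetical corner locations, as would be found by edge detection plus the
# Hough transform, ordered top-left, top-right, bottom-right, bottom-left.
corners = [(212, 143), (1068, 151), (1061, 688), (219, 680)]
gaze_in_image = map_gaze_to_image((640.0, 400.0), corners, image_size=(1024, 768))
```

Here cv2.getPerspectiveTransform solves the homography exactly from four correspondences; with noisy per-frame corner detections, a robust fit via cv2.findHomography would be the natural alternative.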
Figure 4a shows the data observation pattern generated as a track plot using the calculated eye gaze locations. The interpreter eye gaze fixation locations were identified using the calculated eye gaze locations with respect to the displayed images (Figure 4b). There are various methods for identifying fixation locations from calculated eye gaze locations. We implemented the algorithm described in Goldberg and Kotval (1999), which identifies eye gaze fixations based on continuous observation at a particular location (within a 40-pixel-radius area) for at least 100 ms. Based on the calculated fixation locations for each participant for each of the observed images, we generated fixation maps by placing a Gaussian-smoothed circle (with a radius of 40 pixels) at each fixation location. We then obtained the interpreter visual attention "heat maps" for the magnetic images by pixelwise averaging of these fixation maps across subjects (Figure 4c). The averaged fixation map represents the accumulated fixations of all the participants.

Figure 3. (a) FOV frame from the eye tracker with the red cross indicating the location of the eye gaze, (b) lens-distortion-corrected image, (c) detected edges of the monitor screen on the FOV frame, (d) calculated corners represented by four red crosses, and (e) computed eye gaze location (red cross) on the displayed image.

Figure 4. (a) Interpreter data observation pattern plotted on top of the displayed data as a track plot. (b) Blue dots on the displayed images indicate the fixation locations calculated from the data observation data; RTP magnetic intensity changes from blue to pink as the value increases, as shown by the color bar. (c) Visual attention heat map; saliency increases as the color changes from blue to red, as shown by the color bar.

Image saliency algorithms

The image saliency detection algorithms used in this study were selected to represent three different approaches to modeling human attention: biological approaches, purely computational approaches, and a combination of the two. We selected widely known image saliency algorithms in these categories, namely, the visual-attention-model-based algorithm by Itti et al. (1998), referred to here as ITTI; the hypercomplex Fourier transform (HFT) method (Li et al., 2013); and the graph-based visual saliency (GBVS) approach (Harel et al., 2006). All of the leading saliency detection models are based on the following three steps (Harel et al., 2006):

1) extraction of image features/feature maps
2) generation of activation maps/conspicuity maps
3) obtaining the saliency map through selection/normalization.

The ITTI method adapts the saliency-based visual attention model of Koch and Ullman (1985). In this model, the visual input of the human vision system is first processed in parallel to generate a set of image features for different channels, such as color, intensity, and orientation, across multiple spatial scales. The feature maps for these different channels are calculated from center-surround differences, by taking the difference between the smaller and larger scale image features; a simplified sketch of this step is given below.
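As a minimal sketch of the center-surround computation for a single intensity channel: Itti et al. (1998) use nine pyramid scales and several specific center/surround pairs, so this simplified version is illustrative only, and the input file name is hypothetical.

```python
import cv2
import numpy as np

def gaussian_pyramid(img, levels=6):
    # Successively blur and downsample to obtain coarser spatial scales
    pyr = [img.astype(np.float32)]
    for _ in range(levels - 1):
        pyr.append(cv2.pyrDown(pyr[-1]))
    return pyr

def center_surround(pyr, center=2, surround=5):
    # Across-scale difference: the coarse "surround" scale is upsampled back
    # to the finer "center" scale's resolution and subtracted from it.
    c = pyr[center]
    s = cv2.resize(pyr[surround], (c.shape[1], c.shape[0]),
                   interpolation=cv2.INTER_LINEAR)
    return np.abs(c - s)  # high values mark local contrast, i.e., candidate saliency

gray = cv2.imread("magnetic_grid.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input
intensity_feature = center_surround(gaussian_pyramid(gray))
```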
These feature maps at different scales are then combined and normalized, resulting in three conspicuity maps representing color, intensity, and orientation separately. Finally, the three conspicuity maps are combined with equal weights to generate the saliency map.

The HFT method treats saliency detection as a frequency domain problem and defines a concept of nonsaliency using global information. In this method, feature maps are computed based on color, intensity, and motion. The amplitude spectrum, phase spectrum, and eigenaxis spectrum are computed from the feature maps. Spikes in the amplitude spectrum correspond to repeated patterns (nonsalient regions) in the spatial domain. These repeated patterns are smoothed with Gaussian kernels to suppress the nonsalient regions. The saliency map at each scale is derived using the smoothed amplitude spectrum and the original phase and eigenaxis spectra. The final saliency map is selected by choosing the scale with minimal saliency map entropy.

In the GBVS method, the image features are calculated using the ITTI method, but the activation and normalization steps are implemented using a graph-based approach. This is achieved by joining all the nodes (pixels) of the feature maps to generate a fully connected directed graph (Bang-Jensen and Gutin, 2008). Each directed edge is assigned a weight proportional to the dissimilarity between its end nodes and to their closeness. A Markov chain (Norris, 1998) is defined on the graph to estimate the equilibrium distribution, and an activation measure is obtained from pairwise contrast. The normalization step is performed on these activation maps by using another Markovian process on a graph constructed from the activation map to generate the saliency map.

Selection of the most suitable saliency detection algorithm

We compare the saliency maps generated using the three algorithms with the participants' visual attention maps to identify the saliency detection algorithm that best predicts areas that will draw interpreters' attention. From exercise 1, we used eight target and eight nontarget images for this analysis. The nontarget images used for this analysis were selected based on the presence of some kind of magnetic anomaly within the displayed region; note that some of the nontarget images displayed during exercise 1 did not have any anomalies. We generated the interpreter visual attention maps for these 16 images (maps obtained for four target and four nontarget images are shown in Figure 5b) and for the large-scale magnetic image displayed in exercise 2 (Figure 6b). The interpreter visual attention maps were thresholded to obtain binary images, which were used as the ground truth saliency regions for the identification of the most suitable saliency detection algorithm. We generated saliency maps from the 16 magnetic images used for this analysis from exercise 1 (Figure 5) and the magnetic image used in exercise 2 (Figure 6) using the ITTI, GBVS, and HFT saliency detection algorithms. The performance of these saliency detection algorithms in identifying the interpreter visual attention was evaluated by analyzing how well the saliency maps match the ground truth saliency regions. This was achieved by binarizing the saliency maps generated by each algorithm at varying thresholds (from zero to one) and comparing them pixelwise with the ground truth saliency regions.
Based on this comparison, the true positive rate (TPR) and the false positive rate (FPR) were calculated (equations 1 and 2) by assuming the salient regions as the “target” and the nonsalient regions as the “background”:
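The equations themselves do not survive in this version of the text; the standard definitions, consistent with the target/background convention above, are

\mathrm{TPR} = \frac{TP}{TP + FN} \quad (1)

\mathrm{FPR} = \frac{FP}{FP + TN} \quad (2)

where TP, FP, TN, and FN count the pixels that are, respectively, correctly flagged salient, wrongly flagged salient, correctly flagged background, and wrongly flagged background. A minimal sketch of the threshold sweep follows, assuming `saliency` is a map scaled to [0, 1], `ground_truth` is the binary interpreter attention map of the same shape, and both classes are present.

```python
import numpy as np

def roc_points(saliency, ground_truth, n_thresholds=101):
    # Sweep the binarization threshold from 0 to 1 and count pixel outcomes
    # against the ground truth saliency regions at each threshold.
    pts = []
    gt = ground_truth.astype(bool)
    for t in np.linspace(0.0, 1.0, n_thresholds):
        pred = saliency >= t
        tp = np.sum(pred & gt)
        fp = np.sum(pred & ~gt)
        fn = np.sum(~pred & gt)
        tn = np.sum(~pred & ~gt)
        pts.append((fp / (fp + tn), tp / (tp + fn)))  # (FPR, TPR) pair
    return pts
```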